{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create entry points to spark\n",
    "try:\n",
    "    sc.stop()\n",
    "except:\n",
    "    pass\n",
    "from pyspark import SparkContext, SparkConf\n",
    "from pyspark.sql import SparkSession\n",
    "sc=SparkContext()\n",
    "spark = SparkSession(sparkContext=sc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|            model| mpg|cyl| disp| hp|drat|   wt| qsec| vs| am|gear|carb|\n",
      "+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|        Mazda RX4|21.0|  6|160.0|110| 3.9| 2.62|16.46|  0|  1|   4|   4|\n",
      "|    Mazda RX4 Wag|21.0|  6|160.0|110| 3.9|2.875|17.02|  0|  1|   4|   4|\n",
      "|       Datsun 710|22.8|  4|108.0| 93|3.85| 2.32|18.61|  1|  1|   4|   1|\n",
      "|   Hornet 4 Drive|21.4|  6|258.0|110|3.08|3.215|19.44|  1|  0|   3|   1|\n",
      "|Hornet Sportabout|18.7|  8|360.0|175|3.15| 3.44|17.02|  0|  0|   3|   2|\n",
      "+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "only showing top 5 rows\n",
      "\n"
     ]
    }
   ],
   "source": [
    "mtcars = spark.read.csv('../../data/mtcars.csv', inferSchema=True, header=True)\n",
    "# correct first column name\n",
    "mtcars = mtcars.withColumnRenamed('_c0', 'model')\n",
    "mtcars.show(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Select Rows by index\n",
    "\n",
    "First, we need to add index to each rows. The **zipWithIndex** function zips the RDD elements with their corresponding index and returns the result as a new element."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(Row(model='Mazda RX4', mpg=21.0, cyl=6, disp=160.0, hp=110, drat=3.9, wt=2.62, qsec=16.46, vs=0, am=1, gear=4, carb=4),\n",
       "  0),\n",
       " (Row(model='Mazda RX4 Wag', mpg=21.0, cyl=6, disp=160.0, hp=110, drat=3.9, wt=2.875, qsec=17.02, vs=0, am=1, gear=4, carb=4),\n",
       "  1),\n",
       " (Row(model='Datsun 710', mpg=22.8, cyl=4, disp=108.0, hp=93, drat=3.85, wt=2.32, qsec=18.61, vs=1, am=1, gear=4, carb=1),\n",
       "  2),\n",
       " (Row(model='Hornet 4 Drive', mpg=21.4, cyl=6, disp=258.0, hp=110, drat=3.08, wt=3.215, qsec=19.44, vs=1, am=0, gear=3, carb=1),\n",
       "  3),\n",
       " (Row(model='Hornet Sportabout', mpg=18.7, cyl=8, disp=360.0, hp=175, drat=3.15, wt=3.44, qsec=17.02, vs=0, am=0, gear=3, carb=2),\n",
       "  4)]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mtcars.rdd.zipWithIndex().take(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can apply the **map** function to modify the structure of each element. Assume **x** is an element from the above RDD object, **x** has two elements: x[0] and x[1]. x[0] is an **Row** object, and x[1] is the index, which is an integer. We want to merge these two values to create a list. And we also want the first element in the list is the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[0, 'Mazda RX4', 21.0, 6, 160.0, 110, 3.9, 2.62, 16.46, 0, 1, 4, 4],\n",
       " [1, 'Mazda RX4 Wag', 21.0, 6, 160.0, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4],\n",
       " [2, 'Datsun 710', 22.8, 4, 108.0, 93, 3.85, 2.32, 18.61, 1, 1, 4, 1],\n",
       " [3, 'Hornet 4 Drive', 21.4, 6, 258.0, 110, 3.08, 3.215, 19.44, 1, 0, 3, 1],\n",
       " [4, 'Hornet Sportabout', 18.7, 8, 360.0, 175, 3.15, 3.44, 17.02, 0, 0, 3, 2]]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mtcars.rdd.zipWithIndex().map(lambda x: [x[1]] + list(x[0])).take(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's add column names and save the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "header = ['index'] + mtcars.columns\n",
    "mtcars_df = mtcars.rdd.zipWithIndex().map(lambda x: [x[1]] + list(x[0])).toDF(header)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-----+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|index|            model| mpg|cyl| disp| hp|drat|   wt| qsec| vs| am|gear|carb|\n",
      "+-----+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|    0|        Mazda RX4|21.0|  6|160.0|110| 3.9| 2.62|16.46|  0|  1|   4|   4|\n",
      "|    1|    Mazda RX4 Wag|21.0|  6|160.0|110| 3.9|2.875|17.02|  0|  1|   4|   4|\n",
      "|    2|       Datsun 710|22.8|  4|108.0| 93|3.85| 2.32|18.61|  1|  1|   4|   1|\n",
      "|    3|   Hornet 4 Drive|21.4|  6|258.0|110|3.08|3.215|19.44|  1|  0|   3|   1|\n",
      "|    4|Hornet Sportabout|18.7|  8|360.0|175|3.15| 3.44|17.02|  0|  0|   3|   2|\n",
      "+-----+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "only showing top 5 rows\n",
      "\n"
     ]
    }
   ],
   "source": [
    "mtcars_df.show(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After we obtain the **index column**, we can apply the **pyspark.sql.DataFrame.filter** function to select rows of the DataFrame. The **filter** function takes a **column** of **types.BooleanType** as input."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Select specific rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-----+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|index|            model| mpg|cyl| disp| hp|drat|   wt| qsec| vs| am|gear|carb|\n",
      "+-----+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|    1|    Mazda RX4 Wag|21.0|  6|160.0|110| 3.9|2.875|17.02|  0|  1|   4|   4|\n",
      "|    2|       Datsun 710|22.8|  4|108.0| 93|3.85| 2.32|18.61|  1|  1|   4|   1|\n",
      "|    4|Hornet Sportabout|18.7|  8|360.0|175|3.15| 3.44|17.02|  0|  0|   3|   2|\n",
      "|    6|       Duster 360|14.3|  8|360.0|245|3.21| 3.57|15.84|  0|  0|   3|   4|\n",
      "|    9|         Merc 280|19.2|  6|167.6|123|3.92| 3.44| 18.3|  1|  0|   4|   4|\n",
      "+-----+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "mtcars_df.filter(mtcars_df.index.isin([1,2,4,6,9])).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Select rows between a range"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-----+----------+----+---+-----+---+----+----+-----+---+---+----+----+\n",
      "|index|     model| mpg|cyl| disp| hp|drat|  wt| qsec| vs| am|gear|carb|\n",
      "+-----+----------+----+---+-----+---+----+----+-----+---+---+----+----+\n",
      "|    5|   Valiant|18.1|  6|225.0|105|2.76|3.46|20.22|  1|  0|   3|   1|\n",
      "|    6|Duster 360|14.3|  8|360.0|245|3.21|3.57|15.84|  0|  0|   3|   4|\n",
      "|    7| Merc 240D|24.4|  4|146.7| 62|3.69|3.19| 20.0|  1|  0|   4|   2|\n",
      "|    8|  Merc 230|22.8|  4|140.8| 95|3.92|3.15| 22.9|  1|  0|   4|   2|\n",
      "|    9|  Merc 280|19.2|  6|167.6|123|3.92|3.44| 18.3|  1|  0|   4|   4|\n",
      "|   10| Merc 280C|17.8|  6|167.6|123|3.92|3.44| 18.9|  1|  0|   4|   4|\n",
      "+-----+----------+----+---+-----+---+----+----+-----+---+---+----+----+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "mtcars_df.filter(mtcars_df.index.between(5, 10)).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Select rows by a cutoff index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-----+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|index|            model| mpg|cyl| disp| hp|drat|   wt| qsec| vs| am|gear|carb|\n",
      "+-----+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|    0|        Mazda RX4|21.0|  6|160.0|110| 3.9| 2.62|16.46|  0|  1|   4|   4|\n",
      "|    1|    Mazda RX4 Wag|21.0|  6|160.0|110| 3.9|2.875|17.02|  0|  1|   4|   4|\n",
      "|    2|       Datsun 710|22.8|  4|108.0| 93|3.85| 2.32|18.61|  1|  1|   4|   1|\n",
      "|    3|   Hornet 4 Drive|21.4|  6|258.0|110|3.08|3.215|19.44|  1|  0|   3|   1|\n",
      "|    4|Hornet Sportabout|18.7|  8|360.0|175|3.15| 3.44|17.02|  0|  0|   3|   2|\n",
      "|    5|          Valiant|18.1|  6|225.0|105|2.76| 3.46|20.22|  1|  0|   3|   1|\n",
      "|    6|       Duster 360|14.3|  8|360.0|245|3.21| 3.57|15.84|  0|  0|   3|   4|\n",
      "|    7|        Merc 240D|24.4|  4|146.7| 62|3.69| 3.19| 20.0|  1|  0|   4|   2|\n",
      "|    8|         Merc 230|22.8|  4|140.8| 95|3.92| 3.15| 22.9|  1|  0|   4|   2|\n",
      "+-----+-----------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "mtcars_df.filter(mtcars_df.index < 9).show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-----+-------------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|index|              model| mpg|cyl| disp| hp|drat|   wt| qsec| vs| am|gear|carb|\n",
      "+-----+-------------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|   14| Cadillac Fleetwood|10.4|  8|472.0|205|2.93| 5.25|17.98|  0|  0|   3|   4|\n",
      "|   15|Lincoln Continental|10.4|  8|460.0|215| 3.0|5.424|17.82|  0|  0|   3|   4|\n",
      "|   16|  Chrysler Imperial|14.7|  8|440.0|230|3.23|5.345|17.42|  0|  0|   3|   4|\n",
      "|   17|           Fiat 128|32.4|  4| 78.7| 66|4.08|  2.2|19.47|  1|  1|   4|   1|\n",
      "|   18|        Honda Civic|30.4|  4| 75.7| 52|4.93|1.615|18.52|  1|  1|   4|   2|\n",
      "|   19|     Toyota Corolla|33.9|  4| 71.1| 65|4.22|1.835| 19.9|  1|  1|   4|   1|\n",
      "|   20|      Toyota Corona|21.5|  4|120.1| 97| 3.7|2.465|20.01|  1|  0|   3|   1|\n",
      "|   21|   Dodge Challenger|15.5|  8|318.0|150|2.76| 3.52|16.87|  0|  0|   3|   2|\n",
      "|   22|        AMC Javelin|15.2|  8|304.0|150|3.15|3.435| 17.3|  0|  0|   3|   2|\n",
      "|   23|         Camaro Z28|13.3|  8|350.0|245|3.73| 3.84|15.41|  0|  0|   3|   4|\n",
      "|   24|   Pontiac Firebird|19.2|  8|400.0|175|3.08|3.845|17.05|  0|  0|   3|   2|\n",
      "|   25|          Fiat X1-9|27.3|  4| 79.0| 66|4.08|1.935| 18.9|  1|  1|   4|   1|\n",
      "|   26|      Porsche 914-2|26.0|  4|120.3| 91|4.43| 2.14| 16.7|  0|  1|   5|   2|\n",
      "|   27|       Lotus Europa|30.4|  4| 95.1|113|3.77|1.513| 16.9|  1|  1|   5|   2|\n",
      "|   28|     Ford Pantera L|15.8|  8|351.0|264|4.22| 3.17| 14.5|  0|  1|   5|   4|\n",
      "|   29|       Ferrari Dino|19.7|  6|145.0|175|3.62| 2.77| 15.5|  0|  1|   5|   6|\n",
      "|   30|      Maserati Bora|15.0|  8|301.0|335|3.54| 3.57| 14.6|  0|  1|   5|   8|\n",
      "|   31|         Volvo 142E|21.4|  4|121.0|109|4.11| 2.78| 18.6|  1|  1|   4|   2|\n",
      "+-----+-------------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "mtcars_df.filter(mtcars_df.index >= 14).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Select rows by logical criteria"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example 1: select rows when **cyl = 4**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-----+--------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|index|         model| mpg|cyl| disp| hp|drat|   wt| qsec| vs| am|gear|carb|\n",
      "+-----+--------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|    2|    Datsun 710|22.8|  4|108.0| 93|3.85| 2.32|18.61|  1|  1|   4|   1|\n",
      "|    7|     Merc 240D|24.4|  4|146.7| 62|3.69| 3.19| 20.0|  1|  0|   4|   2|\n",
      "|    8|      Merc 230|22.8|  4|140.8| 95|3.92| 3.15| 22.9|  1|  0|   4|   2|\n",
      "|   17|      Fiat 128|32.4|  4| 78.7| 66|4.08|  2.2|19.47|  1|  1|   4|   1|\n",
      "|   18|   Honda Civic|30.4|  4| 75.7| 52|4.93|1.615|18.52|  1|  1|   4|   2|\n",
      "|   19|Toyota Corolla|33.9|  4| 71.1| 65|4.22|1.835| 19.9|  1|  1|   4|   1|\n",
      "|   20| Toyota Corona|21.5|  4|120.1| 97| 3.7|2.465|20.01|  1|  0|   3|   1|\n",
      "|   25|     Fiat X1-9|27.3|  4| 79.0| 66|4.08|1.935| 18.9|  1|  1|   4|   1|\n",
      "|   26| Porsche 914-2|26.0|  4|120.3| 91|4.43| 2.14| 16.7|  0|  1|   5|   2|\n",
      "|   27|  Lotus Europa|30.4|  4| 95.1|113|3.77|1.513| 16.9|  1|  1|   5|   2|\n",
      "|   31|    Volvo 142E|21.4|  4|121.0|109|4.11| 2.78| 18.6|  1|  1|   4|   2|\n",
      "+-----+--------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "mtcars_df.filter(mtcars_df.cyl == 4).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example 2: select rows when **vs = 1 and am = 1**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the filtering is based on multiple **conditions** (e.g., **vs = 1** and **am = 1**), we use the conditions to build a new **boolean type column**. And we filter the DataFrame by the new column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pyspark.sql import functions as F"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<span style=\"color:red\">Warning: when passing multiple conditions to the **`when()`** function, each condition has to be within a pair of parentheses</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Column<b'CASE WHEN ((vs = 1) AND (am = 1)) THEN 1 END AS `filter_col`'>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "filtering_column = F.when((mtcars_df.vs == 1) & (mtcars_df.am == 1), 1).name('filter_col')\n",
    "filtering_column"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we need to add the new column to the original DataFrame. **This can be done by applying the `select()` function to select all original columns as well as the new filtering columns.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Column<b'index'>,\n",
       " Column<b'model'>,\n",
       " Column<b'mpg'>,\n",
       " Column<b'cyl'>,\n",
       " Column<b'disp'>,\n",
       " Column<b'hp'>,\n",
       " Column<b'drat'>,\n",
       " Column<b'wt'>,\n",
       " Column<b'qsec'>,\n",
       " Column<b'vs'>,\n",
       " Column<b'am'>,\n",
       " Column<b'gear'>,\n",
       " Column<b'carb'>]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_original_columns = [eval('mtcars_df.' + c) for c in mtcars_df.columns]\n",
    "all_original_columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Column<b'index'>,\n",
       " Column<b'model'>,\n",
       " Column<b'mpg'>,\n",
       " Column<b'cyl'>,\n",
       " Column<b'disp'>,\n",
       " Column<b'hp'>,\n",
       " Column<b'drat'>,\n",
       " Column<b'wt'>,\n",
       " Column<b'qsec'>,\n",
       " Column<b'vs'>,\n",
       " Column<b'am'>,\n",
       " Column<b'gear'>,\n",
       " Column<b'carb'>,\n",
       " Column<b'CASE WHEN ((vs = 1) AND (am = 1)) THEN 1 END AS `filter_col`'>]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_columns = all_original_columns + [filtering_column]\n",
    "all_columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-----+-------------------+----+---+-----+---+----+-----+-----+---+---+----+----+----------+\n",
      "|index|              model| mpg|cyl| disp| hp|drat|   wt| qsec| vs| am|gear|carb|filter_col|\n",
      "+-----+-------------------+----+---+-----+---+----+-----+-----+---+---+----+----+----------+\n",
      "|    0|          Mazda RX4|21.0|  6|160.0|110| 3.9| 2.62|16.46|  0|  1|   4|   4|      null|\n",
      "|    1|      Mazda RX4 Wag|21.0|  6|160.0|110| 3.9|2.875|17.02|  0|  1|   4|   4|      null|\n",
      "|    2|         Datsun 710|22.8|  4|108.0| 93|3.85| 2.32|18.61|  1|  1|   4|   1|         1|\n",
      "|    3|     Hornet 4 Drive|21.4|  6|258.0|110|3.08|3.215|19.44|  1|  0|   3|   1|      null|\n",
      "|    4|  Hornet Sportabout|18.7|  8|360.0|175|3.15| 3.44|17.02|  0|  0|   3|   2|      null|\n",
      "|    5|            Valiant|18.1|  6|225.0|105|2.76| 3.46|20.22|  1|  0|   3|   1|      null|\n",
      "|    6|         Duster 360|14.3|  8|360.0|245|3.21| 3.57|15.84|  0|  0|   3|   4|      null|\n",
      "|    7|          Merc 240D|24.4|  4|146.7| 62|3.69| 3.19| 20.0|  1|  0|   4|   2|      null|\n",
      "|    8|           Merc 230|22.8|  4|140.8| 95|3.92| 3.15| 22.9|  1|  0|   4|   2|      null|\n",
      "|    9|           Merc 280|19.2|  6|167.6|123|3.92| 3.44| 18.3|  1|  0|   4|   4|      null|\n",
      "|   10|          Merc 280C|17.8|  6|167.6|123|3.92| 3.44| 18.9|  1|  0|   4|   4|      null|\n",
      "|   11|         Merc 450SE|16.4|  8|275.8|180|3.07| 4.07| 17.4|  0|  0|   3|   3|      null|\n",
      "|   12|         Merc 450SL|17.3|  8|275.8|180|3.07| 3.73| 17.6|  0|  0|   3|   3|      null|\n",
      "|   13|        Merc 450SLC|15.2|  8|275.8|180|3.07| 3.78| 18.0|  0|  0|   3|   3|      null|\n",
      "|   14| Cadillac Fleetwood|10.4|  8|472.0|205|2.93| 5.25|17.98|  0|  0|   3|   4|      null|\n",
      "|   15|Lincoln Continental|10.4|  8|460.0|215| 3.0|5.424|17.82|  0|  0|   3|   4|      null|\n",
      "|   16|  Chrysler Imperial|14.7|  8|440.0|230|3.23|5.345|17.42|  0|  0|   3|   4|      null|\n",
      "|   17|           Fiat 128|32.4|  4| 78.7| 66|4.08|  2.2|19.47|  1|  1|   4|   1|         1|\n",
      "|   18|        Honda Civic|30.4|  4| 75.7| 52|4.93|1.615|18.52|  1|  1|   4|   2|         1|\n",
      "|   19|     Toyota Corolla|33.9|  4| 71.1| 65|4.22|1.835| 19.9|  1|  1|   4|   1|         1|\n",
      "+-----+-------------------+----+---+-----+---+----+-----+-----+---+---+----+----+----------+\n",
      "only showing top 20 rows\n",
      "\n"
     ]
    }
   ],
   "source": [
    "new_mtcars_df = mtcars_df.select(all_columns)\n",
    "new_mtcars_df.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can filter the DataFrame by the requested conditions. After we filter the DataFrame, we can drop the filtering column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-----+--------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|index|         model| mpg|cyl| disp| hp|drat|   wt| qsec| vs| am|gear|carb|\n",
      "+-----+--------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "|    2|    Datsun 710|22.8|  4|108.0| 93|3.85| 2.32|18.61|  1|  1|   4|   1|\n",
      "|   17|      Fiat 128|32.4|  4| 78.7| 66|4.08|  2.2|19.47|  1|  1|   4|   1|\n",
      "|   18|   Honda Civic|30.4|  4| 75.7| 52|4.93|1.615|18.52|  1|  1|   4|   2|\n",
      "|   19|Toyota Corolla|33.9|  4| 71.1| 65|4.22|1.835| 19.9|  1|  1|   4|   1|\n",
      "|   25|     Fiat X1-9|27.3|  4| 79.0| 66|4.08|1.935| 18.9|  1|  1|   4|   1|\n",
      "|   27|  Lotus Europa|30.4|  4| 95.1|113|3.77|1.513| 16.9|  1|  1|   5|   2|\n",
      "|   31|    Volvo 142E|21.4|  4|121.0|109|4.11| 2.78| 18.6|  1|  1|   4|   2|\n",
      "+-----+--------------+----+---+-----+---+----+-----+-----+---+---+----+----+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "new_mtcars_df.filter(new_mtcars_df.filter_col == 1).drop('filter_col').show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
